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Abstract 

In many applications, it is important to derive information about the topol- 
ogy and the internal connections of dynamical systems interacting together. 
Examples can be found in fields as diverse as Economics, Neuroscience and 
Biochemistry. The paper deals with the problem of deriving a descriptive 
model of a network, collecting the node outputs as time series with no use of 
a priori insight on the topology, and unveiling an unknown structure as the 
estimate of a "sparse Wiener filter". A geometric interpretation of the prob- 
lem in a pre-Hilbert space for wide-sense stochastic processes is provided. 
We cast the problem as the optimization of a cost function where a set of 
parameters are used to operate a trade-off between accuracy and complexity 
in the final model. The problem of reducing the complexity is addressed by 
fixing a certain degree of sparsity and finding the solution that "better" satis- 
fies the constraints according to the criterion of approximation. Applications 
starting from real data and numerical simulations are provided. 
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1. Introduction 



The interest on networks of dynamical systems is increasing in recent 
years, especially because of their capability of modeling and describing a 
large variety of phenomena and behaviors. Remarkably, while networks of 
dynamical systems are well studied and analyzed in physics 0, Oj], 23 1 and 
engineering (29[ 24 , 26 1. there are fewer results that address the problem 



of reconstructing an unknown dynamical network, since it poses formidable 
theoretical and practical challenges jlij]. However, unraveling the intercon- 
nectedness and the interdependency of a set of processes is of significant 
interest in many fields and the necessity for general tools is rapidly emerg- 
ing (see |27|, 0, liil and the bibliography therein for recent results) . In the 
literature, authors have approached this problem in different ways and with 
various purposes, such as deriving a network topology from just sampled data 



(see e.g. |17l . 1271 122| . |25|) or determining the presence of substructures in the 
networked system (see e.g. (23I . 0Q. The Unweighted Pair Group Method 
with Arithmetic mean (UPGMA) [21] is one of the first techniques proposed 
to reveal an unknown topology. It has found widespread use in the recon- 
struction of phylogenetic trees and is widely employed in other areas such as 
communication systems and resource allocation problems [9]. Another well- 



known technique for the identification of a tree network is developed in [17 
for the analysis of a stock portfolio. The authors identify a tree structure 
according to the following procedure: i) a metric based on the correlation 
index is defined among the nodes; ii) such a metric is employed to extract the 
Minimum Spanning Tree Q which forms the reconstructed topology. How- 
ever, in [13] a severe limit of this strategy is highlighted, where it is shown 
that, even though the actual network is a tree, the presence of dynamical 
connections or delays can lead to the identification of a wrong topology. In 
(2o| a similar strategy, where the correlation metric is replaced by a metric 
based on the coherence function, is numerically shown to provide an exact 



reconstruction for tree topologies. Furthermore, in 18] it is also illustrated 
that a correct reconstruction can be guaranteed for any topology with no 
cycles. 

An approach for the identification of more general topologies is developed in 



the area of Machine Learning for Bayesian dynamical networks |lll . [I0j • In 
this case, however, a massive quantity of data needs to be collected in order 
to accurately evaluate conditional probability distributions. 
In |2j different techniques to quantify and evaluate the modular structure of 
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a network are compared and a new one is proposed trying to combine both 
the topological and dynamic information of the complex system. However, 
the network topology is only qualitatively estimated in terms of "clusters", 
In |27| a method to identify a network of dynamical systems is described. 
However, primary assumptions of the technique are the possibility to manip- 
ulate the input of every single node and to conduct as many experiments as 
needed to detect the link connectivity. 



More recently, in [22] and [19] interesting equivalences between the identifica- 
tion of a dynamical network and a Iq sparsification problem are highlighted, 
suggesting the difficulty of the reconstruction procedure 0,01. 

In this paper, the main idea is to cast the problem of unveiling an un- 
known structure as the estimate of a "sparse Wiener filter". Given a set of 
iV stochastic processes X = {x\, ...,x n }, we consider each Xj as the output 
of an unknown dynamical system, the input of which is given by at most rnj 
stochastic processes {x ail , ...,x a . .} selected from X \ {xA. The choice of 
{x a . ...,x a . ,} is realized according to a criterion that takes into account 

3 i 3 i m j 

the mean square of a modeling error. The parameters mj can be a-priori 
defined, if we intend to impose a certain degree of sparsity on the network 
or a strategy for self-tuning can be introduced penalizing the introduction of 
any additional link, if it does not provide a significant reduction of the cost. 
For any possible choice of {x a i , ...,x a . .}, the computation of the related 

3 1 3 ! 771 j 

Wiener Filter leads to the definition of a modeling error, which is a natural 
way to measure the quality of the description of Xj granted by the time series 
{x a i , ...,x a . m } in terms of predictive/smoothing capability. Once this step 
has been performed, each system is represented by a node of a graph and, 
then, the arcs linking any x ajmk to Xj are introduced for each node Xj. At 
the end of this procedure a graph, modeling the network topology, has been 
obtained. 

We start introducing a pre-Hilbert space for wide-sense stochastic processes, 
where the inner product defines the notion of perpendicularity between two 
stochastic processes. We will show that this way of formulating the prob- 
lem has strong similarities with /o-minimization problems, which have been 
a very active topic of research in Signal Processing during the last few years. 
Indeed, a standard / - mrn i m i z ation problem amounts to finding the "spars- 
est" solution of a set of linear equations in a finite dimension Hilbert space 
With no additional assumptions on the solution, the problem is combi- 
natorially intractable jij]. This has propelled the study of relaxed problems 
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involving, for example, the minimization of the l\ norm, which is a convex 
problem and it is known to provide solutions with at least a certain order of 
sparsity (22|. Unfortunately, we will show that such a relaxation procedure 
is not viable in our formulation. Indeed, it is not possible to define a suitable 
norm in the space of transfer functions that guarantees a certain degree of 
sparsity. For this reason, we resort to some greedy techniques in order to 
find a suboptimal solution with desired sparsity properties. 
The rest of the paper is organized as follows. 

In Section [2] the network topology identification problem is formulated. 
In Section [3] a geometric interpretation and the construction of a pre-Hilbert 
space needed to define a distance and an inner product for stochastic pro- 
cesses is addressed. In Section H] the connection with the compressive sensing 
problem is shown. In Section [5] a greedy algorithm addressing the problem is 
presented along with an alternative approach based on iterated re-weighted 
least squares. Finally, in Section [6] the results obtained by applying the tech- 
niques to numerical data are discussed. In the Appendix most of the needed 
definitions, propositions, lemmas and proofs needed for the construction of 
the pre-Hilbert space are added. 
Notation: 
Z: the integer set; 
M: the real set; 
C: the complex set; 
E[-\. the mean operator; 
( • ) T : the transponse operator. 

2. Problem Formulation 

In this section we provide the main definitions to cast the problem of mod- 
eling a network structure. We consider M stochastic processes x\, . . . ,% 
representing the output of M nodes in a network with an unknown topol- 
ogy. In order to determine the links connecting the nodes, we follow a pro- 
cedure based on estimation techniques. Given a process Xj and a param- 
eter rrij G N, we search for the Mj processes x ai , . . . , x ak , x am with 
otk 7^ Jj VA; = 1, . . . ,rrij, which provide the best estimate of xj according to a 
quadratic criterion. The value rrij is a tuning parameter allowing one to oper- 
ate a trade-off between the sparsity and the accuracy of the model. Thus, rrij 
can be a-priori chosen or, conversely, determined using a self-tuning strategy. 
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We introduce now some definitions and results, which turn out essential 
for the rigorous formulation of the problem. For sake of clarity, we report in 
the Appendix all the additional needed definitions and properties. 

Definition 1. Let ei(t), with i = 1,...,N and t G Z ; be N scalar time- 
discrete, zero-mean, jointly wide-sense stationary random processes in a prob- 
ability space (Q,a,U), where Q is the sample space, a is a sigma algebra on 
Q and II is a probability measure on a. Then, for any t G Z define the vec- 
tor e(t) := (ei(t), ..,eAr(£)) T ; describing a N -dimensional time- discrete, zero- 
mean, wide-sense stationary random process. Moreover, for any fa,t 2 G Z 
denote the (N x N) covariance matrix as 

R e (fa,t 2 ) :=E[e(fa)e T (t 2 )}. (1) 

The entry with i,j G {1, ■■■,N} of R e (fa,t 2 ) is given by 

R ei e j (ti,t 2 ) := E[ei{fa)e 3 {t 2 )}. 

Since any two processes and ej are jointly wide-sense stationary by defini- 
tion, R eiej (fa,t 2 ) only depends on r := t 2 — fa: 

Reieji^ifa — fa) = Raejifaih), Vfa,h £ Z, 

and, thus, R e (fa,t 2 ) too depends only on t 2 — fa, i.e. 

R e {0,t 2 -fa) = R e (fa,t 2 ), Vfa,t 2 eZ. 

Abusing the notation it is possible to write more concisely R e (r) = R e (0,r). 

Definition 2. Consider a vector-valued sequence h(k) G M lxJV with k G Z. 
We define its Z -transform as: 

oo 

H(z) := h (k)z' k , 

k=—oo 

and we assume that the sum converges for any z 6 C such that r\ < \z\ < 
r 2 with ri < 1 < r 2 . Besides, we also assume that any entry of the N- 
dimensional vector H(z) is a real-rational function of z. By the properties 
of the Z-transform, H{z), along with the convergence domain defined by r\ 
and r 2 , uniquely identifies the sequence h(k). Moreover, given a vector of 
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rationally related random processes e(t) := (ei(t), ..,e]y(t)) T , denote for any 
t G Z the random variable 

oo 

y t := h(k)e(t-k). 

k=—oo 

Then, we define by H(z)e the related stochastic process such that 

(H{z)e){t)=y t WteZ. 

Definition 3. Given a N -dimensional time- discrete, zero-mean, wide-sense 
stationary random process e(t) := (ei(i), .., e7v(t)) T , we define its power spec- 
tral density $ e (z) as: 

oo 

* e (z) := Re(r)z- T , 

T = — OO 

having a certain domain of convergence T> C C in the variable z. Denoting 
by $ eiej (z) the entry of$ e (z), it follows that 

oo 
r= — oo 

Besides, if for any i,j G {1, N} the power spectral density $ eiej (z) exists 
on the unit circle \z\ = 1 of the complex plane and it is a real-rational function 
of z, we formally write 

with A(z),B(z) real coefficient polynomials, such that B(z) ^ for any 
z6C, \z\ — 1. In such a case, we say that e is a vector of rationally related 
random processes. 

Finally, let us introduce the following sets: 

J 7 :={W(z)|W(,z) is a real-rational scalar function 
of z G C defined for \z\ = 1} 
r x " :={W(z)\W(z) G C mxn and any 
of its entries is in J 7 }. 
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Problem 4. Consider a set X := {x%, ...,x n } C Te of n rationally related 
processes with zero mean and known (cross)-power spectral densities <& XiXj (z). 
Then, in the above framework the mathematical formulation of the considered 
problem can be stated as follows: 



mm 

,..., aj 



E{x j -J2w j , a . k {z)x a . k \ , (2) 



where every Wj t(X . k (z), with k = l,...,mj, is a possibly non-causal transfer 
function. 

Remark 5. Fixed any set {dj^YkLi; Problem^is immediately solved by a 
multiple input Wiener filter. However, the determination of the parameters 
cij^ makes the problem combinatorial. 



3. A geometric interpretation 

It is possible to give a geometrical interpretation of §2§ by embedding the 
processes xi, . . . ,% in a suitable vector space. This interpretation has the 
main advantage of giving to the Wiener filter the meaning of a projective 
operator in such a space. 

Definition 6. Let e = (e\, ... ) 6n) t be a vector of N rationally related ran- 
dom processes. We define the set Te, as 

Je := {x = H(z)e | H(z) G F lxN } . 

Proposition 7. The ensemble (J-e, +, -, R) is a vector space. 

Proof. Consider X 1 , X 2 , X 3 £ Fe, X\ £ X\,x 2 £ X 2 , x 3 £ X 3 , and «i, a 2 £ 

• Commutativity for the sum 
Consider 



x := x\ + x 2 £ Xi + X 2 . (3) 

Then, since x = x 2 + x±, we also have that x £ X 2 + X\. Since 
(J-e, +, •, R) is a partition, we obtain X\ + X 2 = X 2 + X%. 
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Associativity for the sum 
Consider 



x a := x 1 + (x 2 + x 3 ) e X 1 + (X 2 + X 3 ) 
x b : = (xi + x 2 ) + x 3 e (Xi + X 2 ) + X 3 



(4) 
(5) 
(6) 



Then, since x a = Xb, for the property of a partition set, Xi+(X 2 +X 3 ) = 



Additive identity 

The process Xo(t) = for any t is in J-'e, because the zero transfer 
function is in T . Let the set X G J-'e be the set that constains x . 
Since x\ + Xo = X\ for any x±, we have that X is the identity element 
for the addition. 

Additive inverse 

If Xi G J-e, then also —x\ G J-e, because the transfer function — 1 G T . 
Scalar multiplication identity 

Let x\ be a process in X\. The scalar 1 is the multiplication identity. 
Indeed, we have 



Thus, it holds that 1 • X x = X x . 
Associativity of the scalar multiplication 

Since we have that x = a\{a 2 Xi) = (otia 2 )xi, we also have that 
a\(a 2 Xi) = {a\a 2 )Xi. 

Distribuitivity of the scalar sum 

Since we have {pt.\ + a 2 )xi = a±xi + a 2 x±, we also have (a± + a 2 )Xi = 
ctiXi + a 2 Xi 

Distribuitivity of the vector sum 

Since we have a±(xi + x 2 ) = ol\X\ + a±x 2 , we also have a\(Xi + X 2 ) = 
a\Xi + ol\X 2 . □ 



(X 1 + X 2 ) + x 3 . 



1 • x± — x± G X\. 



(7) 
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Definition 8. For any x G J-e we denote the norm induced by the inner 
product as 



\\x\\ :=< x, x > . 

As shown in details in the Appendix, the set J-e, along with the operation 
< •, • > is a pre-Hilbert space (with the technical assumption that x\ and X2 
are the same processes if x\ ~ x-z). 

We provide an ad-hoc version of the Wiener Filter (guaranteeing that 
the filter will be real rational) with an interpretation in terms of the Hilbert 
projection theorem. Indeed, given signals y, xi, x n G J-e, the Wiener Filter 
estimating y from x := (x±, ...,x n ) can be interpreted as the operator that 
determines the projection of y onto the subspace Tx 

Proposition 9. Let e be a vector of rationally related processes. Let y and 
Xi, x n be processes in the space Te. Define x := (xi, x n ) T and consider 
the problem 

inf \\y- W(z)x\\ 2 . (8) 

If & x (uj) > 0, for all uj G [— 7r, tt], the solution exists, is unique and has the 
form 

W(z) = ^(z)^)- 1 . 
Moreover, for any W'(z) G T x * n x, it holds that 

<y-W(z)x,W'(z)x>=0. (9) 

Proof. Observe that, since q G X, the cost function satisfies 

<Z> yy (u) + W(cu)<Z> xx (cu)W*(uj)+ 

-IT 

- $ xy (uj)W*(uj) - W{u)% x {uj)duj. 

The integral is minimized by minimizing the integrand for all u G [— tt, 7r]. It 
is straightforward to find that the minimum is achieved for 
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Denning the filter W(z) = ^ XXl (z)^ XIXl (z)~ 1 & real-rational transfer matrix 
is obtained with no poles on the unit circle that has the specified frequency 
response. Thus x = W(z)xi G X minimizes the cost (jSJ). Equation ([9]) is 
an immediate consequence of the Hilbert projection theorem (for pre-Hilbert 
spaces) [l5j. 

Problem 10. The mathematical formulation of the problem can be seen now 
as the following: 



mm 

Via, 



ft 



k=l 



(10) 



4. Links with compressive sensing 

In this section we highlight the connections between the problem of mod- 
eling a network topology and the compressive sensing problem. Such a con- 
nection is possible because of the pre-Hilbert structure constructed in Section 
[3] and Appendix. Indeed, the concept of inner product defines a notion of 
"projection" among stochastic processes and makes it possible to seamlessly 
import tools developed for the compressive sensing problem in order to tackle 
that of describing a sparsified topology. 

In the recent few years sparsity problems have attracted the attention of 
researchers in the area of Signal Processing. The reason is mainly due to 
the possibility of representing a signal using only few elements (words) of a 
redundant base (dictionary). Applications are numerous, rangingfrom data- 
compression to high-resolution interpolation, and noise filtering [3], |28| . 
There are many formalizations of the problem, but one of the most common 
is to cast it as 

min ||xo — \l/w||2 subject to ||w||o <m, (11) 

w 

where n < p, xq G 1R p , \P G R pxn is a matrix, whose columns represent a 
redundant base employed to approximate xq and the "zero-norm" (it is not 
actually a norm) 

H| o :=|{ieN|tUi^0}| (12) 

is defined by the number of non-zero entries of a vector w. It can be said that 
w is a "simple" way to express x as a linear combination of the columns of 
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where the concept of "simplicity" is given by a constraint on the number 
of non-zero entries of w. 

For each j = 1, n define the following sets: 

W (i) = {W(z) e T lxn \W 3 (z) = 0} , (13) 

where Wj(z) denotes the j-th component of W(z). For any W E define 
the "zero-norm" as 

||W|| = {# of entries such that 3 z G C, W l {z) ^ 0} 

and define the random vector 

x = (xi, ...,x n ) T . (14) 
Then, the problem (j2J) can be formally cast as 

min \\xj — Wx\\ 2 subject to ||VK||n < m (15) 

which is, from a formal point of view, equivalent to the standard Iq problem 
as defined in (iTTl) . 

5. Solution via suboptimal algorithms 

The problem of modeling network interconnections/complexity reduction 
we have formulated in this paper is equivalent to the problem of determining 
a sparse Wiener filter, as explained in the previous section, once a notion 
of orthogonality is introduced. This formal equivalence shows how deriving 
a suitable topology can immediately inherit a set of practical tools already 
developed in the area of compressing sensing. 

Here we present, as illustrative examples, modifications of algorithms and 
strategies, well-known in the Signal Processing community, which can be 
adopted to obtain suboptimal solutions to the problem of modeling the net- 
work interconnections. 

While formally identical to (fTTj) . the problem of a topology reconstruction 
cast as in (fT5|) still has its own characteristics. Since the "projection" proce- 
dure in f)15p is given by the estimation of a Wiener filter, it is computationally 
more expensive than the standard projection in the space of vectors of real 
numbers. For this reason greedy algorithms offer a good approach to tackle 
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the problem since speed becomes a fundamental factor. Moreover, since the 
complexity of the network model is here one of final goal, greedy algorithms 
are a suitable solution, since they allow one to specify explicitly the connec- 
tion degree rrij of every node Xj. This feature is in general not provided by 
other algorithms. As an alternative approach to greedy algorithms we also 
describe a strategy based on iterated reweighted optimizations as described 




5.1. A modified Orthogonal Least Squares (Cycling OLS) 

Orthogonal Least Squares (OLS) is a greedy algorithm proposed for the 
first time in [6| and in many ways it resembles the algorithm of Matching 
Pursuit developed in (l6| . It basically consists of iterated orthogonal pro- 
jections on elements of a (possibly redundant) base to approximate a given 
vector. For the details of this algorithm we remand the reader to |6j. How- 
ever, for the sake of clarity, we reformulate it in terms of our problem. The 
initialization occurs at the first step setting the set of the chosen elements of 
the dictionary to T^ 1 ' = 0. At the l—th. iteration step, OLS determines the 
term Xj to be added to the reduced dictionary by projecting Xj onto the 
space generated by T^ 1 '^ := T^ 1 ) U{xj} for any i ^ j. Then is defined as 
the r^' 4 ) for which II the smallest and the the algorithm moves to 

the next iteration step. The standard OLS goes on at every step introducing 
a new vector until a stopping condition is met (usually on the norm of the 
residual r k or on the number of iterations). 

We propose an algorithm which derives directly from OLS but it does not in- 
crease the number of vectors x aj k approximating Xj above rrij. The variation 
from OLS is very simple. At any iteration, given the set of vectors T^ 1 ), 
if it already contains rrij vectors, the algorithm chooses a vector in T^ 1 ) to 
be removed and tries to replace it with another vector in order to improve 
the quality of the approximation and updates it. If such an improvement 
is not possible by removing any of the vectors in the current selection, the 
algorithm stops. The implementation can be described using the following 
pseudocode. 

Cycling Orthogonal Least Squares: 

0. define Xq := (null time series) and c = 0. 

1 . initialize the m., -tuple S = (xq, Xq..., Xq) and k = 1 
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2 . while c < rrij 

2a. for % = 1, ri, % ^ j 

define Si as the m^-tuple where Xi replaces the k-th element of S 
and 

define x^ as the projection of Xj on to Si 

2b . a = arg maxj \\Xj — Xa \\ 

2c . if x a = S[k] then c = c + 1 

2d. else S[k] = x a , c = 1, k = k mod rrij, k = k + 1 

3. return S 

The reason of our modification is simple. COLS implements a coordinate 
descent guaranteeing that the number of non-zero components of the solution 
does not exceed rrij. Once such a limit has been reached, it tries to improve 
the quality of the approximation without reducing the sparsity of the current 
solution. 

5.2. Solution via Re-Weighted Least Squares (RWLS) 

Another possible approach to "encourage" sparse solutions is provided by 
reweighted minimization algorithms as proposed in and 0- A comparison 
between reweighted norm-1 and norm-2 methods is performed in |28| . We 
consider only reweighted least squares, because such an algorithm is easier 
to implement, but the intuition behind the two techniques is basically the 
same. 

Using Parseval theorem, Problem [21 can be formulated as 




(16) 



Consider the following convex variation of the problem 




(17) 



subject to 




n 
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where /j,^ 6 E n is a set of weights for the filters Wj^(z). Using the compact 
notation introduced in Section HJ we can equivalently write 



min 1 1 a;,- — Wxll 2 subject to WWW" 2 :, < 1 



(18) 



where, for a vector \i = (/i 



/j, n ), we define 



Let us assume that the a^'s and the relative W a . h solving fl2]) are known. 
Technically, we could set 




if I = otjfi for some k = 1, rrij and \i\ = +oo otherwise. With such a choice 
of weights, the two problems (T5]) and (|18|) would be equivalent since they 
would provide the same solutions. However, Problem (|18|) has the advantage 
of being convex. Of course, the values a^jt are not a-priori known, thus it 
is not possible to evaluate (TL9l) . An iterative approach has been proposed 
making use of the intuition that we have just formulated to estimate the 
weights (HS|) . 

Reweighted Least Squares : 

. For all Xj 

1 . initialize the weight vector \i := 

2 . while a stop criterion is met 
2a. solve the convex problem 



min \\x<j — Wx 
weWj J 



subject to \\Wj\\ 2 < 1 



2b . compute the new weigths 




Wi(u)\\du 
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3. return all the Wj's. 

At any iteration the convex relaxation of the problem is solved and new 
weights are computed as a functions of the current solution. When a stopping 
criterion is met (usually on the number of iterations), the final solution can 
be obtained by selecting the rrij largest entries of each Wj. 

6. Applications and examples 

In this section we report numerical results obtained implementing the 
algorithms described in the previous section. In order to evaluate the perfor- 
mances provided by the two algorithms (COLS and RWLS) we have consid- 
ered a network of 20 nodes as represented in Figure [Tftrue) . In the graph 
every node Nj describes a stochastic process Xj, while every directed arc form 
a node Aj to a node Nj represents a transfer function Hji(z) ^ that has 
been randomly selected from a class of causal FIR filters of order 5. The 
absence of such an arc implies that Hji(z) = 0. Thus, each process Xj follows 
the dynamics 



Every node signal is also implicitly considered affected by an additive white 
Gaussian noise Cj such that the Signal to Noise Ratio (SNR) is 4 (a very 
noisy scenario). All the noise processes are independent from each other. The 
network has been simulated for 2000 steps obtaining 20 time series. The time 
series have been employed to estimate a non-causal FIR approximation of the 
Wiener Filters of order 21 both in the COLS and in the RWLS algorithm. 
In Figure [T] we report the results of the identification. Using a global search 
the global minima of f TTTj) provide the topologies in Figure [TJ (reduced 2) 
and Figure [1] (reduced 3) for the case rrij = 2 and rrij = 3 for each node 
respectively. In Figure [1] (no reduction) we report the topology obtained 
with no constraint on the maximum number of edges: a small threshold has 
been introduced to remove any edge (Aj, Nj) associated with Wji(z) ~ 0. In 
the second row of graphs we have the results for COLS for the cases rrij = 1, 
rrij = 2 and rrij = 3. In Figure [1] (COLS variable) we report the result given 
by the implementation of a strategy to automatically determine the number 
of edges: the number of edges is increased only if gives a reduction of 20% 
of the residual error. In the third row of graphs, we report the analogous 
results for the RWLS algorithm. 




(20) 
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(RWLS 1) (RWLS 2) (RWLS 3) (RWLS variable) 

Figure 1: The actual network topology (true); the topology obtained by a global minimiza- 
tion of (jHJ with rrij = 2 for every node (reduced 2); the topology with rrij = 3 for every 
node (reduced 3); the topology with no constraint on rrij but with a "small" threshold 
imposed on a norm of the Wiener Filters to avoid a complete graph (no reduction); the 
topologies obtained used the COLS suboptimal approach with rrij — 1 (COLS 1), rrij = 2 
(COLS 2), rrij = 3 (COLS 3), and a self-adjusting strategy for rrij (COLS variable): a 
link is introduced if gives at least a reduction of 20% of the residual error; the topologies 
obtained by RWLS after 10 iterations and keeping only the rrij filters with largest norm 
(RWLS 1), (RWLS 2), (RWLS 3), or a self-adjusting strategy (RWLS variable). 
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Country 


A 11 cH"fp li 3 n T)ol \p\v 

A Llo il f ill f ill l—s V ^1 If 11 


ATID 

A KJ IS 


A 11 ClfTP 1 IP 
A llo il dlld 
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1—11 CXZill J-IjCCII 




Rrp 71 1 
J_X1 ClAX 1 


( .PTiflniflri T)o11pt 

V_- C 1 1 1 CX 1. X 1 CX 1 1 J_X V J 1 1 C 1 1 


CAD 


V_y Cll 1 CX 1. 1 CX 


( jiiiripcip Rpninmni 

V_. lllllv . )V J. \XZ 11111111 Ul 


CNY 


I ,11 111 P 
V^lllllCl 


TjQ'nicn TaTOTIP 
l^Clllloll 1VIU11C 


DKK 

lylVlv 


T)pii m p t k 

J_X V X 1 1 1 1 (XI XV 


r jI 1 TO 
i i 111 \j 


FIT 

1—/ \J 


rv InTOTXPPri TTnirvn 

UU.1 UUCClll U 11HJ11 


Wv\\ i o. n Pm lrin 

J_)lliloll J. W I L 1 1 1 1 


(XPR 

V_7 1 XX 


i -.rpa i" Ryxt" a l t l 

V71 CCli J_J1 1 i Cll 11 


\A on cr K fill cr IioIIpt 


HKD 

J. 11V J^/ 


Hnncf k nTicf 

J-J-Wllti XYWllti 


TnniPiri Knnpp 

.lllillClll l.LLLLJC'C^ 


TNR 

±1 ill 


Tri n l p 

XlliXlCX 


I ana n pcp ypti 

O fXUClllCoC J. Cll 


TPY 


1 SI TX Q Tl 
J tXXJClll 


Sion1~n Knrp^ri \A/on 

kJ \J il i 1 1 1VU1 CCXll VVU11 


KRW 

1V1 1 V V 


Soilfri rxOTPP 
kX VX il 1 1 1 IVUiCd 


SiTl Ti^TlKari TllllPP 
kJl 1 i—i f X 1 1 xYf X 1 1 1 UUCC 


LKR 


SiTl TiPTIkP 
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A/Tpvipan PpQn 
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MXN 


ly/r pvi pn 


TYTp. 1 pa/cii pn i n cent 

IV-LCllGl V olCXll 1 ill lei £11 u 


MYR 


lvf P 1 p v^lP 


Norwegian Krone 


NOK 


Norway 


New Zealand Dollar 


NZD 


New Zealand 


Swedish Krona 


SEK 


Sweden 


Singapore Dollar 


SGD 


Singapore 


Thai Baht 


THB 


Thailand 


Taiwanese Dollar 


TWD 


Taiwan 


American Dollar 


USD 


United States of America 


South African Rand 


ZAR 


South Africa 



Table 1: List of the currencies considered in the analysis. 



6.1. An application to real data: currency exchange rates 

In this section we also present the results obtained by applying our tech- 
nique to real data. We have considered the daily exchange rate of 22 selected 
currencies (reported in Tabled]) from the last 7 years providing 1715 samples 
for any of the time series. The missing data (the exchange rate on Satur- 
days and Sundays) have been interpolated (by cubic splines) such that a 
total number of 2400 daily points have been obtained for our analysis. The 
Cycling OLS algorithm has been applied on the logarithmic returns of the 
time series (a standard procedure in Finance,) with order 1,2 e 3 and the 
estimated topologies are depicted in Figure [2] a, Figure [2] b and Figure [2] c. 
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7. Final Remarks 

We have formulated the problem of deriving a link structure from a set 
of time series, obtained by sampling the output of as many interconnected 
dynamical systems. Every time series is represented as a node in a graph 
and their dependencies as connecting edges. The approach we follow in de- 
termining the graph arcs relies on (linear) identification techniques based on 
an ad-hoc version of the Wiener Filter (guaranteeing that the filter will be 
real rational) with an interpretation in terms of the Hilbert projection the- 
orem. If a time series X t turns outs "useful" to model the time series X,, 
then the directed arc is introduced in the graph. In order to modulate 
the complexity of the final graph, a maximum number rrij of arcs pointing at 
Xj is assumed and a cost function is minimized to find the most appropriate 
arcs. The problem has a similar formulation and strong connections with the 
problem of compressing sensing, which has been widely studied in the last 
few years. Such a connection is possible because of the pre-Hilbert structure 
we have constructed. Indeed, the concept of inner product defines a notion of 
"projection" among stochastic processes and makes it possible to seamlessly 
import tools developed for the compressive sensing problem in order to tackle 
the problem of modeling a network topology. 

The problem of topology reconstruction/complexity reduction is equivalent 
to the problem of determining a sparse Wiener filter as explained. However, 
since an optimization problem must be solved for any single node, we consider 
the application of suboptimal solutions. In particular, we introduce a subop- 
timal greedy algorithm obtained as a modification of the Orthogonal Least 
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Squares (COLS) and an alternative approach based on iterated ReWeighted 
Least Squares (RWLS). By the comparison of the two algorithms on numer- 
ical data we have shown the effectiveness of the proposed solutions. Note 
that in the present paper no absolute error metric is provided. 
Future work will investigate some measure criteria to judge the performance 
of the algorithm. For instance, as a starting method we could count the 
correct identified links and the wrong ones, when the underlying topology is 
known, to define the "more accurate" topology and extend such a measure 
of accuracy in some norms to the unknown topology case. 
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8. Appendix 

We provided hereafter the definitions and propositions which are addi- 
tionally needed for the construction of the pre-Hilbert space. 

Definition 11. Given two time- discrete scalar, zero-mean, wide- sense jointly 
stationary random processes Xi(t) and x 2 {t), we write that X\ ~ x 2 if and 
only if, for any t G Z, E[(x 2 (t) — x^t)) 2 ] = 0, that is Xi(t) a = x 2 (t) (that is 
Xi(t) = x 2 (t) almost surely for any t G Z). 

Proposition 12. The relation ~ is an equivalence relation on any set X of 

zero-mean time-discrete wide- sense jointly stationary scalar random processes 
defined on the time domain Z. 

Proposition 13. Let x 1 :— H^\z)e, x 2 := H^ 2 \z)e be two elements of 
Te. Then x±,x 2 are scalar, zero-mean, wide- sense jointly stationary random 
processes with rational power cross-spectral densities having no poles n the 
set {z G C| \z\ = 1}. 

Proof. The processes X\ and x 2 are scalar by the definition of Te. Since 
H^ l \z), H^\z) G J rlxn , they are real-rational and defined on the unit cir- 
cle, and, as a consequence, they admit a unique representation in terms of 
bilateral ^-transform 




(21) 



k=- 



OG 




(22) 



with h£\ hJjp G R lx ™ for k G Z, such that the convergence is guaranteed on 
the unit circle \z\ = 1. 

First, let us evaluate the mean of x±(t), that is 



E[x 1 (t)]=E ^e(t-k) = 



_k=~oo 

oo 



= J] hPE[e(t-k)] = HW(l)E[e(0)] = 0. 
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Thus, it does not depend on the time t. Analogously E[x 2 {t)} = 0. Now, let 
us evaluate the cross-covariance function 

R XlX2 (t,t + r) := E[ Xl (t)x 2 (t + t) t ] = 
= E 



= E 



J2 hV(k)e(t-k)\ £e T (t + r-0(fc (2 W J 

=— oo / \l=— oo 

oo 

E hil) ( k ) e (t - k)e T {t + t- l)(h^(l)f 

=— oo l=— oo 

oo 

= E E h m (k)E[e(t - k)e T (t + r- l)](h^(l)) T = 

fc=— oo l=— oo 

oo oo 

= E E h^(k)R e (r-l + k)(h^(l)) T = R XlX2 (0,r). 



k=—oo l=— oo 



Thus, the cross-covariance does not depend on the time t and, abusing no- 
tation, it is possible to define 

R x1X2 (t) := R XiX2 (0,t) (23) 

Moreover, if we evaluate the bilateral Z-transform of R XlX2 (r), we have 

oo 

®x lX2 {z) ■■= E R xix 2 (r)z~ T = 

T= — OO 

OO oo oo 

= E E E hU(k)R e (T-l + k)(hW(l)) T Z- 



t=— oo k= — oo !=— oo 
J f/ (1) (^)$ e (^)^ (2) (^ 1 )- 



□ 



Proposition 14. Given a rationally related vector e, the set Te is closed 
with respect to addition, transformation by H(z) G T and multiplication 
by scalar a £ Moreover, it holds that, for x\ = H^\z)e G Te and 
x 2 = H {2 \z)eeTe, 

H {1) (z)e + H {2) (z)e = [H {l \z) + H {2 \z)] e 
H{z)[H^\z)e\ = [H{z)H^\z)] e 
a[H {1 \z)e] = [aH^(z)] e. 
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Proof. Let 



H(z) = E h (k)z- k . 



(24) 



k=— co 



• Sum. 

We have 



x 1 (t) + x 2 (t) = 

oo 

E ^^-fc) 



,fe= — CO 

oo 



E *? )e (* - fc ) 

= — CO 

E + ^M* ~k) = + # (2) (*)(*)• 

fc— — CO 

Since [H^(z) + H (2 ^(z)} has no poles on the set {z G C| \z\ = 1}, 
xi + %2 G .Fe. 

Multiplication by G J 7 . 

Since Xi G .Fe, it is a 1 -dimensional rationally related vector. Then, it 
makes sense to compute the random process H(z)x 1 for H(z) G F = 
F 1 



-lxl 



{H{z)x 1 ){t) := E h(k)xi(t — k) — 

h = — co 

CO CO 

= e m E fi%(t-k-i) = 

k=—oo l=— CO 

CO CO 

= e m*o E *£* e (* - o = 



fc=— oo Z=— oo 



E 

i= — CO 



e 



,k=— oo 



e(t - Z) 



= ([i/(,)ij( 1 )(,)] e) (t), 



(25) 

(26) 

(27) 

(28) 
(29) 



where the last equality comes from the properties of the convolution. 
Since H(z)H^(z) has no poles in the set {z G C| \z\ = 1}, H(z)x 1 G 
Tt 
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• Multiplication by scalar a G K. 

It is a special case of the previous property. □ 

Definition 15. We define a scalar binary operation < , - > on Te in the 
following way 

< xx,x 2 >:= #3:1*2(0). 

Proposition 16. The set Te, along with the operation < •,• > is a pre- 
Hilbert space (with the technical assumption that X\ and x 2 are the same 
processes if Xi ~ x 2 ). 

Proof. The proof is left to the reader, making use of the introduced nota- 
tions and properties. □ 
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